Consensus methods for DNA and protein sequence alignment.

Authors

M S Waterman

R Jones

Abstract

Introduction

The increasing body of nucleic acid sequence data has created interest among many scientists in computational approaches to macromolecular sequence analysis. Several international databases have been created in order to store the data in a useful format, both for archival and analysis purposes.¹ Both DNA and protein sequence databases are maintained. The value of simply having easy access to all membrane protein sequences, for example, is not to be underestimated. The quantity of data has naturally led to the development of computer approaches to sequence analysis. The purpose of this chapter is to present some of the tools that we have created in order to analyze multiple sequences in a rigorous, efficient, and systematic way.

Much computer analysis of molecular sequences is directed toward discovery of biologically significant patterns. These patterns include homologous genes, RNA secondary structure, tRNA or structural RNAs, palindromes in DNA sequences, regulatory patterns in promoter regions, and protein structural patterns. Once the patterns have been located they can often be tested by experiment, as in the case of promoter elements. Evolutionary relationships, however, cannot be directly tested, and increasing emphasis is being attached to the discovery and interpretation of sequence evolution.

Sequence alignment is a popular approach to pattern analysis.² Computer alignments are often based on an explicit optimization function, rewarding matches and penalizing mismatches, insertions, and deletions. Sequence alignment often gives useful information about evolutionary or functional relationships between sequences. Our approach is based on what we refer to as consensus analysis. Consensus sequence analysis is usually performed by visual inspection of the sequences and by experiment. Of course, a protein binding site can only be verified by experiment, and analysis by "eye" can be biased. Thus, it is useful to have computer methods that can find consensus patterns best fitting explicitly stated criteria. Some algorithms have been developed along these lines, and they are described here, along with some biological examples. Our earlier methods applied only to DNA; here we also describe recent extensions to protein sequences.

In 1970 Needleman and Wunsch⁵ published an approach to sequence comparison (alignment) using a dynamic programming algorithm. Their algorithm finds the maximum similarity between two sequences, where matches score positive weight and mismatches, insertions, and deletions score nonpositive weight. Mathematicians began to attempt to define a distance between sequences and so to construct a …
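The dynamic programming idea mentioned above can be illustrated with a minimal sketch. It computes a global similarity score in the spirit of Needleman and Wunsch, not the consensus method of this chapter. The function name `nw_score`, the weights (match +1, mismatch -1, insertion or deletion -1), and the per-position gap penalty are illustrative assumptions and do not come from the source.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Maximum global similarity between sequences a and b.

    Assumed, illustrative weights: matches score a positive weight;
    mismatches, insertions, and deletions score nonpositive weights.
    """
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j characters of b; initially a is empty, so only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # first i characters of a against empty b
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # a[i-1] aligned to a gap (deletion)
                curr[j - 1] + gap,   # b[j-1] aligned to a gap (insertion)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(nw_score("ACGT", "AGT"))
```

Only two rows of the table are kept because the sketch returns the score alone; recovering the alignment itself would require storing the full table and tracing back through it.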

Similar resources

gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences

Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...

Dengue virus type-3 envelope protein domain III; expression and immunogenicity

Objective(s): Production of a recombinant and immunogenic antigen using dengue virus type-3 envelope protein is a key point in dengue vaccine development and diagnostic researches. The goals of this study were providing a recombinant protein from dengue virus type-3 envelope protein and evaluation of its immunogenicity in mice. Materials and Methods: Multiple amino acid sequences of different i...

A Probabilistic Approach to a Consensus Multiple Alignment

We consider the problem of obtaining the maximum a posteriori probability (MAP) estimate of a consensus ancestral sequence for a set of DNA sequences. Our maximization method, called ASA (dnA Sequence Alignment), can be applied to the refinement of noisy regions of a DNA assembly, to the alignment of genomic functional sites, or to the alignment of any set of DNA sequences related by a star-lik...

A MODEL FOR THE BASIC HELIX-LOOP-HELIX MOTIF AND ITS SEQUENCE SPECIFIC RECOGNITION OF DNA

A three-dimensional model of the basic Helix-Loop-Helix motif and its sequence specific recognition of DNA is described. The basic-helix I is modeled as a continuous α-helix because no α-helix breaking residue is found between the basic region and the first helix. When the basic region of the two peptide monomers are aligned in the successive major groove of the cognate DNA, the hydrophobi...

Sequence-specific reconstruction from fragmentary databases using seed sequences: implementation and validation on SAGE, proteome and generic sequencing data

MOTIVATION DNA assembly programs classically perform an all-against-all comparison of reads to identify overlaps, followed by a multiple sequence alignment and generation of a consensus sequence. If the aim is to assemble a particular segment, instead of a whole genome or transcriptome, a target-specific assembly is a more sensible approach. GenSeed is a Perl program that implements a seed-driv...

Journal:

Methods in Enzymology

Volume 183

Publication date: 1990
